
perf: use hybrid sort for inline object order #855

Open
He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/hybrid-inline-sort-order


He-Pin (Contributor) commented May 13, 2026

Motivation

computeSortedInlineOrder was originally tuned for inline objects with a
handful of fields. Once strict JSON imports started constructing inline
Val.Objs from byte-parsed JSON, the wider key counts of imported objects
(kube-prometheus and similar configs) turned the existing insertion sort
into a quadratic hot spot.

Repeated Scala Native profiling of kube-prometheus materialization showed
Materializer.computeSortedInlineOrder among the top stack samples. This PR
keeps the small-object fast path and avoids the quadratic insertion-sort
behaviour for wider objects.

Modification

  • Materializer.computeSortedInlineOrder delegates to a new
    sortInlineOrder dispatch:
    • len ≤ 1: return.
    • len ≤ 16: existing insertion sort over the index array.
    • len > 16: in-place quicksort with median-of-three pivot, falling back
      to insertion sort once a partition has ≤ 16 elements. It recurses on
      the smaller partition and loops on the larger one (Sedgewick), so
      stack depth is O(log n).
  • Sorting still uses Util.compareStringsByCodepoint — Jsonnet key ordering
    semantics are unchanged.
  • Only a fresh Array[Int] is mutated; shared parsed keys/members are not
    touched.
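A minimal sketch of the dispatch described above, assuming the keys are available as an `Array[String]` and the order is a fresh `Array[Int]` of indices into it. `HybridSortSketch`, this `sortInlineOrder` signature, and `compareByCodepoint` (a stand-in for `Util.compareStringsByCodepoint`) are hypothetical illustrations, not the code in this PR; only the threshold of 16, the median-of-three pivot, and the smaller-half recursion come from the description.

```scala
// Hypothetical sketch, not sjsonnet's implementation: sorts `order` (indices into
// `keys`) so that keys(order(0)) < keys(order(1)) < ... by Unicode codepoint.
object HybridSortSketch {
  private val InsertionThreshold = 16

  // Stand-in for Util.compareStringsByCodepoint: compares codepoints rather than
  // the UTF-16 code units that String.compareTo uses.
  def compareByCodepoint(a: String, b: String): Int = {
    var i = 0
    while (i < a.length && i < b.length) {
      val ca = a.codePointAt(i)
      val cb = b.codePointAt(i)
      if (ca != cb) return Integer.compare(ca, cb)
      i += Character.charCount(ca) // equal codepoints have equal widths
    }
    Integer.compare(a.length, b.length) // equal prefix: shorter string first
  }

  // Dispatch: 0-1 elements need no work, <= 16 use insertion sort, larger
  // arrays use quicksort that finishes small partitions with insertion sort.
  def sortInlineOrder(order: Array[Int], keys: Array[String]): Unit =
    if (order.length <= 1) ()
    else if (order.length <= InsertionThreshold) insertionSort(order, keys, 0, order.length - 1)
    else quicksort(order, keys, 0, order.length - 1)

  private def cmp(keys: Array[String], x: Int, y: Int): Int =
    compareByCodepoint(keys(x), keys(y))

  private def swap(a: Array[Int], i: Int, j: Int): Unit = {
    val t = a(i); a(i) = a(j); a(j) = t
  }

  private def insertionSort(a: Array[Int], keys: Array[String], lo: Int, hi: Int): Unit = {
    var i = lo + 1
    while (i <= hi) {
      val v = a(i)
      var j = i - 1
      while (j >= lo && cmp(keys, v, a(j)) < 0) { a(j + 1) = a(j); j -= 1 }
      a(j + 1) = v
      i += 1
    }
  }

  private def quicksort(a: Array[Int], keys: Array[String], lo0: Int, hi0: Int): Unit = {
    var lo = lo0
    var hi = hi0
    while (hi - lo + 1 > InsertionThreshold) {
      // Median-of-three: afterwards keys of a(lo) <= a(mid) <= a(hi).
      val mid = lo + (hi - lo) / 2
      if (cmp(keys, a(mid), a(lo)) < 0) swap(a, mid, lo)
      if (cmp(keys, a(hi), a(lo)) < 0) swap(a, hi, lo)
      if (cmp(keys, a(hi), a(mid)) < 0) swap(a, hi, mid)
      val pivot = keys(a(mid))
      // Hoare partition: afterwards a(lo..j) <= pivot <= a(j+1..hi).
      var i = lo - 1
      var j = hi + 1
      var done = false
      while (!done) {
        i += 1
        while (compareByCodepoint(keys(a(i)), pivot) < 0) i += 1
        j -= 1
        while (compareByCodepoint(keys(a(j)), pivot) > 0) j -= 1
        if (i >= j) done = true else swap(a, i, j)
      }
      // Recurse on the smaller side; keep looping on the larger one.
      if (j - lo + 1 < hi - j) { quicksort(a, keys, lo, j); lo = j + 1 }
      else { quicksort(a, keys, j + 1, hi); hi = j }
    }
    insertionSort(a, keys, lo, hi)
  }
}
```

Comparing by codepoint rather than with `String.compareTo` only changes the result outside the Basic Multilingual Plane: U+1F600 is stored as the surrogate pair `\uD83D\uDE00`, so `String.compareTo` places it before U+FF61, while a codepoint comparison places it after.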

Result

Re-benched on 2026-05-21 against master @ b252b184. Apple Silicon, JDK 21,
Scala 3.3.7.

Allocation (JMH -prof gc, full bench corpus)

The sort is in place, so allocation is essentially unchanged. Every bench is
within ±0.3% B/op except manifestJsonEx at -1.70%, which is a genuine
reduction: the smaller index-array path skips the temporary key list the old
helper kept producing on this shape. No bench shows an allocation regression
above +0.3% / +250 B.

Wall-clock — Scala-Native release binary (hyperfine)

Selected object-construction-shape benches (warmup=2, min-runs=5):

bench                                    master (ms)    this PR (ms)   Δ vs master
cpp_suite/realistic2.jsonnet             87.38 ± 1.51   86.50 ± 1.64   -1.00%
cpp_suite/bench.02.jsonnet               60.78 ± 1.32   59.83 ± 1.44   -1.56%
sjsonnet_suite/lazy_array_compr.         92.25 ± 3.25   92.05 ± 2.51   -0.22%
cpp_suite/realistic1.jsonnet             10.70 ± 1.18   10.68 ± 1.17   -0.19%
cpp_suite/gen_big_object.jsonnet         10.04 ± 1.19   10.45 ± 2.85   +4.09%
cpp_suite/large_string_template.jsonnet  13.40 ± 4.86   10.89 ± 1.20   -18.71%
go_suite/manifestJsonEx.jsonnet          6.79 ± 1.28    6.19 ± 1.02    -8.78%
go_suite/manifestTomlEx.jsonnet          6.18 ± 1.17    5.78 ± 1.05    -6.48%

Across the bench corpus the change is largely wall-clock-neutral: most
short-running benches (< 30 ms) are dominated by Native start-up variance
(±10–15% run-to-run). The targeted case, wide inline JSON objects from
imports, is not represented in the bench corpus; the original kube-prometheus
profile is where the change pays off most.

No bench in the corpus regresses by more than 5% outside Native start-up noise.

Correctness

  • RendererTests and JsonImportFastPathTests pass (13 + 7 cases).
  • ./mill 'sjsonnet.jvm[3.3.7]'.test — green.
  • ./mill __.checkFormat — green.

Hybrid sort is correct: insertion sort on small partitions matches the
existing implementation; quicksort uses a Hoare partition with a
median-of-three pivot, recursing on the smaller partition and looping on the
larger one (worst-case stack depth O(log n)); object keys are unique, so
stability is not required.
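A rough way to check the stack-depth claim is to instrument the same loop structure and record the deepest recursive call. The sketch below (hypothetical `DepthBoundSketch`, sorting plain `Int` values instead of key indices, and not the PR's code) recurses only on the smaller Hoare partition and loops on the larger one. Each recursive call receives at most half of its caller's current range, and a call only recurses while that range has more than 16 elements, so the depth stays below log2(n) - 3 regardless of input order.

```scala
// Hypothetical sketch: the same hybrid structure on plain Ints, instrumented
// to report the deepest recursive call. Not the PR's code.
object DepthBoundSketch {
  private val Threshold = 16

  // Sorts `a` in place and returns the maximum recursion depth reached.
  def sortAndMeasure(a: Array[Int]): Int = sort(a, 0, a.length - 1, 0)

  private def swap(a: Array[Int], i: Int, j: Int): Unit = {
    val t = a(i); a(i) = a(j); a(j) = t
  }

  private def sort(a: Array[Int], lo0: Int, hi0: Int, depth: Int): Int = {
    var deepest = depth
    var lo = lo0
    var hi = hi0
    while (hi - lo + 1 > Threshold) {
      // Median-of-three: leaves a(lo) <= a(mid) <= a(hi).
      val mid = lo + (hi - lo) / 2
      if (a(mid) < a(lo)) swap(a, mid, lo)
      if (a(hi) < a(lo)) swap(a, hi, lo)
      if (a(hi) < a(mid)) swap(a, hi, mid)
      val pivot = a(mid)
      // Hoare partition: a(lo..j) <= pivot <= a(j+1..hi), with lo <= j < hi.
      var i = lo - 1
      var j = hi + 1
      var done = false
      while (!done) {
        i += 1
        while (a(i) < pivot) i += 1
        j -= 1
        while (a(j) > pivot) j -= 1
        if (i >= j) done = true else swap(a, i, j)
      }
      // Recurse into the smaller side only; the larger side stays in this loop.
      if (j - lo + 1 < hi - j) {
        deepest = math.max(deepest, sort(a, lo, j, depth + 1))
        lo = j + 1
      } else {
        deepest = math.max(deepest, sort(a, j + 1, hi, depth + 1))
        hi = j
      }
    }
    // Insertion sort finishes the remaining range of <= 16 elements.
    var k = lo + 1
    while (k <= hi) {
      val v = a(k)
      var m = k - 1
      while (m >= lo && a(m) > v) { a(m + 1) = a(m); m -= 1 }
      a(m + 1) = v
      k += 1
    }
    deepest
  }
}
```

Sorted, reverse-sorted, all-equal, and random inputs should all stay within the same logarithmic bound, because the bound comes from always recursing on the smaller side rather than from the quality of the pivot.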

Test plan

  • ./mill 'sjsonnet.jvm[3.3.7]'.test — green
  • ./mill __.checkFormat — green

He-Pin marked this pull request as ready for review May 13, 2026 12:26
He-Pin marked this pull request as draft May 13, 2026 12:26
He-Pin force-pushed the perf/hybrid-inline-sort-order branch 3 times, most recently from 0848e79 to 4e987c7, May 20, 2026 18:18
He-Pin marked this pull request as ready for review May 21, 2026 02:42
Motivation:
Large inline objects produced by strict JSON imports can exceed the small-object shape that computeSortedInlineOrder was originally tuned for. Native sampling on kube-prometheus showed sorted inline-order computation as a materialization hotspot, and insertion sort becomes quadratic on those wider objects.

Modification:
Keep insertion sort for small inline objects, and use an in-place quicksort with median-of-three pivot and insertion-sort cleanup for larger visible field sets.

Result:
Kube-prometheus Native A/B improved on top of strict JSON byte imports, with forward mean 145.3ms -> 140.0ms and reverse mean 151.6ms -> 148.9ms. Formatting and the full test suite pass.

References:
Upstream-base: databricks/sjsonnet@cedc083
Prior optimization: 883fca5 perf: parse strict JSON imports from bytes
He-Pin force-pushed the perf/hybrid-inline-sort-order branch from ef717de to 5332110, May 21, 2026 02:53